
add delay for direct S3 upload of tiny files #6931

Merged (9 commits) on Jun 5, 2020

Conversation

@qqmyers (Member) commented May 22, 2020

What this PR does / why we need it: slows down reporting uploads of tiny files to avoid concurrency issue(s) in Dataverse (see issue)

Which issue(s) this PR closes:

Closes #6930

Special notes for your reviewer: the delay is currently 500 msec per file and could be made longer if the problem persists. In my testing, 200 msec was too short (I still saw issues sometimes), so it can't be made much shorter than that. FWIW, prior updates should have made sure that there is only one call from fileupload.js to Dataverse at a time, so I'm assuming there's some processing going on in Dataverse after the HTTP call returns, during which a new call still causes problems. If there are other ways to handle this, I'd be interested to know what else might be done. (This should be effective, but it would be nice not to rely on an arbitrary delay.)

Suggestions on how to test this: same as for previous tests - with files smaller than 1 KB (26 bytes is fine)

Does this PR introduce a user interface change?: No

Is there a release notes update needed for this change?: No

Additional documentation: None
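The mechanism described above can be sketched roughly as follows. This is not the actual fileupload.js code; `reportFn` is a hypothetical stand-in for the AJAX call that tells Dataverse a direct-S3 upload finished, and the only detail taken from the PR is the 500 msec per-file spacing:

```javascript
// Sketch (assumed names, not the PR's code): report finished uploads to the
// server one at a time, spacing the reports at least DELAY_MS apart so the
// server can finish its post-request processing before the next call arrives.
const DELAY_MS = 500; // per-file delay from the PR; 200 msec was too short

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function reportUploadsSequentially(files, reportFn) {
  const results = [];
  for (const file of files) {
    results.push(await reportFn(file)); // only one in-flight call at a time
    await sleep(DELAY_MS);              // then wait before the next report
  }
  return results;
}
```

The loop guarantees both serialization (each `await reportFn(file)` completes before the next starts) and a fixed gap between calls, which is the "arbitrary delay" the PR author would prefer to avoid.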

@qqmyers qqmyers changed the title add delay for tiny files add delay for direct S3 upload of tiny files May 22, 2020
@coveralls commented May 22, 2020

Coverage Status

Coverage decreased (-0.05%) to 19.585% when pulling 5f3821e on GlobalDataverseCommunityConsortium:IQSS/6930 into 91ddb11 on IQSS:develop.

IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from Code Review 🦁 to QA 🔎✅ May 26, 2020
@kcondon kcondon assigned kcondon and qqmyers and unassigned kcondon May 26, 2020
@kcondon kcondon moved this from QA 🔎✅ to Community Dev 💻❤️ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 26, 2020
@kcondon (Contributor) commented May 26, 2020

Still seeing null-pointer/index-out-of-bounds errors on 25-100 small-file uploads; sometimes the upload completes with one file missing, sometimes it halts partway. I experimented with bumping the delay up to 2000 msec, but that did not solve the problem.

@qqmyers (Member, Author) commented Jun 5, 2020

@kcondon - ready for another (hopefully last) try. I found a place where the code wasn't waiting before requesting new upload URLs, which could cause the same concurrency/state issue as before. I can definitely see a slow-down in the upload rate for small files (where the upload and hash times are negligible). Given that the earlier changes to stop concurrent calls seemed to reduce the occurrences for me, I'm hopeful this fix catches the ones you're seeing.
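One common pattern for the kind of fix described here is to gate every request for a new upload URL behind a shared promise chain, so a request is only issued after the previous server interaction has fully completed. This is a sketch under assumed names (`requestUrlFn` stands in for the call that asks the server for a pre-signed S3 upload URL), not the PR's actual code:

```javascript
// Sketch (assumed names): serialize requests for new upload URLs by
// chaining each one onto a shared pending promise.
let pending = Promise.resolve();

function requestUploadUrl(file, requestUrlFn) {
  // Each caller waits for everything queued before it to finish.
  const next = pending.then(() => requestUrlFn(file));
  // Keep the chain alive even if one request fails, so later
  // requests are not blocked by an earlier rejection.
  pending = next.catch(() => {});
  return next;
}
```

With this pattern, even if the UI fires off several URL requests at once, the server only ever sees one in flight at a time, which addresses the concurrency/state issue without a fixed per-call delay.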

@djbrooke djbrooke moved this from Community Dev 💻❤️ to QA 🔎✅ in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Jun 5, 2020
@kcondon kcondon self-assigned this Jun 5, 2020
@kcondon kcondon merged commit 875ca66 into IQSS:develop Jun 5, 2020
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from QA 🔎✅ to Done 🚀 Jun 5, 2020
@djbrooke djbrooke added this to the Dataverse 5 milestone Jun 8, 2020
Linked issue: File upload: Direct S3 upload has intermittent failures when uploading larger number of files.